Guangzhou


Scale-teaching: Robust Multi-scale Training for Time Series Classification with Noisy Labels
Peitian Ma, South China University of Technology, Guangzhou, China

Neural Information Processing Systems

Deep Neural Networks (DNNs) have been criticized because they easily overfit noisy (incorrect) labels. To improve the robustness of DNNs, existing methods for image data regard samples with small training losses as correctly labeled data (the small-loss criterion). However, the discriminative patterns of time series are easily distorted by external noise (e.g., frequency perturbations) during the recording process, so the training losses of some time series samples fail to satisfy the small-loss criterion. This paper therefore proposes a deep learning paradigm called Scale-teaching to cope with noisy labels in time series classification. Specifically, we design a fine-to-coarse cross-scale fusion mechanism that learns discriminative patterns by training multiple DNNs simultaneously on time series at different scales. Each network is trained in a cross-teaching manner, using complementary information from the other scales to select small-loss samples as clean-labeled data. For the unselected large-loss samples, we introduce multi-scale embedding graph learning via label propagation, which corrects their labels using the selected clean samples. Experiments on multiple benchmark time series datasets demonstrate the superiority of the proposed Scale-teaching paradigm over state-of-the-art methods in terms of effectiveness and robustness.
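
Since the abstract turns on the small-loss criterion and cross-teaching, a minimal sketch may help; the function below is our illustration (names such as select_small_loss and the noise_rate parameter are ours, not the paper's):

```python
import numpy as np

def select_small_loss(losses, noise_rate):
    """Return indices of the (1 - noise_rate) fraction of samples with the
    smallest per-sample loss; these are treated as likely-clean labels."""
    n_keep = int(len(losses) * (1.0 - noise_rate))
    return np.argsort(losses)[:n_keep]

# Cross-teaching idea: each network trains on the small-loss samples selected
# by its peer at a different scale, so selection errors do not self-reinforce.
losses_fine   = np.array([0.2, 1.9, 0.1, 0.4, 2.3])  # per-sample losses, fine scale
losses_coarse = np.array([0.3, 2.1, 0.2, 1.8, 0.5])  # per-sample losses, coarse scale

clean_for_coarse_net = select_small_loss(losses_fine, noise_rate=0.4)
clean_for_fine_net   = select_small_loss(losses_coarse, noise_rate=0.4)
print(clean_for_coarse_net, clean_for_fine_net)
```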


Learning Representations for Time Series Clustering

Neural Information Processing Systems

Time series clustering is an essential unsupervised technique when category information is unavailable. It has been widely applied to genome data, anomaly detection, and, more generally, any domain where pattern detection matters. Although feature-based time series clustering methods are robust to noise and outliers and can reduce the dimensionality of the data, they typically rely on domain knowledge to manually construct high-quality features. Sequence-to-sequence (seq2seq) models can instead learn representations from sequence data in an unsupervised manner given appropriate learning objectives, such as reconstruction and context prediction. When applying seq2seq to time series clustering, however, obtaining a representation that captures the temporal dynamics of the sequence and multi-scale features while retaining good clustering properties remains a challenge.
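
As a rough illustration of the general recipe the abstract describes (an unsupervised seq2seq reconstruction objective followed by clustering of the learned codes), here is a minimal sketch with a generic GRU autoencoder; the architecture and hyperparameters are ours, not the paper's model:

```python
import torch
import torch.nn as nn
from sklearn.cluster import KMeans

class Seq2SeqAE(nn.Module):
    """GRU autoencoder: the final encoder state is the series representation."""
    def __init__(self, n_features=1, hidden=64):
        super().__init__()
        self.encoder = nn.GRU(n_features, hidden, batch_first=True)
        self.decoder = nn.GRU(n_features, hidden, batch_first=True)
        self.out = nn.Linear(hidden, n_features)

    def forward(self, x):
        _, h = self.encoder(x)            # h: (1, batch, hidden)
        dec_in = torch.zeros_like(x)      # decode from the encoding alone
        y, _ = self.decoder(dec_in, h)
        return self.out(y), h.squeeze(0)

model = Seq2SeqAE()
x = torch.randn(32, 100, 1)              # 32 series of length 100
recon, z = model(x)
loss = nn.functional.mse_loss(recon, x)  # reconstruction objective
loss.backward()

# After training, cluster the learned representations.
labels = KMeans(n_clusters=3, n_init=10).fit_predict(z.detach().numpy())
```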


Heatwave increases nighttime light intensity in hyperdense cities of the Global South: A double machine learning study

arXiv.org Artificial Intelligence

Heatwaves, intensified by climate change and rapid urbanisation, pose significant threats to urban systems, particularly in the Global South, where adaptive capacity is constrained. This study investigates the relationship between heatwaves and nighttime light (NTL) radiance, a proxy for nighttime economic activity, in four hyperdense cities: Delhi, Guangzhou, Cairo, and São Paulo. We hypothesised that heatwaves increase nighttime activity. Using a double machine learning (DML) framework, we analysed data from 2013 to 2019 to quantify the impact of heatwaves on NTL while controlling for local climatic confounders. Results revealed a statistically significant increase in NTL intensity during heatwaves, with Cairo, Delhi, and Guangzhou showing elevated NTL on the third day and São Paulo exhibiting a delayed response on the fourth day. Sensitivity analyses confirmed the robustness of these findings, indicating that prolonged heat stress prompts urban populations to shift activities to the night. The heterogeneous responses across cities highlight the possible influence of urban morphology and adaptive capacity on heatwave impacts. Our findings provide a foundation for policymakers to develop data-driven heat adaptation strategies, ensuring that cities remain liveable and economically resilient in an increasingly warming world.
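
For readers unfamiliar with DML, here is a minimal sketch of the partialling-out estimator with cross-fitted nuisance models on synthetic data; the learners, effect size, and variable names are illustrative assumptions, not the study's specification:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_predict

rng = np.random.default_rng(0)
n = 2000
X = rng.normal(size=(n, 5))                              # climatic confounders
T = (X[:, 0] + rng.normal(size=n) > 0.5).astype(float)   # heatwave indicator
Y = 0.3 * T + X[:, 0] + rng.normal(size=n)               # NTL radiance

# Stage 1 (cross-fitted nuisances): predict E[Y|X] and E[T|X] out-of-fold.
y_hat = cross_val_predict(RandomForestRegressor(n_estimators=100), X, Y, cv=5)
t_hat = cross_val_predict(RandomForestRegressor(n_estimators=100), X, T, cv=5)

# Stage 2: regress outcome residuals on treatment residuals.
y_res, t_res = Y - y_hat, T - t_hat
theta = (t_res @ y_res) / (t_res @ t_res)   # estimated heatwave effect on NTL
print(round(theta, 3))                      # should land near the true 0.3
```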


Advancing Math Reasoning in Language Models: The Impact of Problem-Solving Data, Data Synthesis Methods, and Training Stages

arXiv.org Artificial Intelligence

Mathematical reasoning remains a challenging area for large language models (LLMs), prompting the development of math-specific LLMs such as LLEMMA, DeepSeekMath, and Qwen2-Math, among others. These models typically follow a two-stage training paradigm: pre-training on math-related corpora and post-training on problem datasets for supervised fine-tuning (SFT). Despite these efforts, the improvements in mathematical reasoning achieved through continued pre-training (CPT) are often less significant than those obtained via SFT. We investigate three primary research questions: (1) Can problem-solving data enhance the model's mathematical reasoning capabilities more effectively than general mathematical corpora during CPT? (2) Are synthetic data from the same source equally effective, and which synthesis methods are most efficient? (3) How does the same problem-solving data perform when used in CPT versus SFT? Our findings indicate that problem-solving data significantly enhances the model's mathematical capabilities compared to general mathematical corpora. We also identify effective data synthesis methods, demonstrating that the tutorship amplification synthesis method achieves the best performance. Furthermore, while SFT facilitates instruction-following abilities, it underperforms CPT with the same data, which can be partially attributed to its poor learning capacity for more challenging problem-solving data. To address the challenge of insufficient mathematical reasoning capabilities in LLMs, various math-specific LLMs have been developed. These models generally follow a common training paradigm: during the pre-training stage, math-related corpora are filtered from extensive internet data to augment the model's mathematical knowledge; during the post-training stage, the model is fine-tuned on problem datasets, which enables it to follow instructions and produce outputs in the desired format. Recently, there has been a growing focus on constructing preference datasets over the solution process to perform Step-DPO (Lai et al., 2024) or online RLHF (Dong et al., 2024). These approaches aim to obtain more accurate reasoning pathways, thereby significantly enhancing the mathematical reasoning capabilities of the models. (Work done during Zui Chen's internship at the Guangdong Institute of Smart Education, Jinan University, Guangzhou, China. The model is available at https://huggingface.co/ai4ed/MathGPT-8B.)
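
A minimal sketch of the data-level difference between the two training stages discussed above: CPT applies the next-token loss to every token, while SFT masks the prompt so that only solution tokens contribute to the gradient. The tensors below are stand-ins, not the paper's setup:

```python
import torch
import torch.nn.functional as F

vocab, seq_len = 1000, 12
logits = torch.randn(1, seq_len, vocab)           # model outputs (stand-in)
tokens = torch.randint(0, vocab, (1, seq_len))    # problem + solution tokens
prompt_len = 7                                    # tokens belonging to the problem

# CPT-style objective: next-token loss over the whole sequence.
cpt_loss = F.cross_entropy(logits[:, :-1].reshape(-1, vocab),
                           tokens[:, 1:].reshape(-1))

# SFT-style objective: same loss, but prompt positions are masked out with
# ignore_index, so gradients come only from the solution tokens.
labels = tokens.clone()
labels[:, :prompt_len] = -100
sft_loss = F.cross_entropy(logits[:, :-1].reshape(-1, vocab),
                           labels[:, 1:].reshape(-1), ignore_index=-100)
```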


SADGA: Structure-Aware Dual Graph Aggregation Network for Text-to-SQL
School of Computer Science, Guangdong University of Technology, Guangzhou, China

Neural Information Processing Systems

The Text-to-SQL task, which aims to translate natural language questions into SQL queries, has drawn much attention recently. One of the most challenging problems in Text-to-SQL is generalizing a trained model to unseen database schemas, also known as the cross-domain Text-to-SQL task. The key lies in the generalizability of (i) the encoding method that models the question and the database schema and (ii) the question-schema linking method that learns the mapping between words in the question and tables/columns in the database schema. Focusing on these two key issues, we propose a Structure-Aware Dual Graph Aggregation Network (SADGA) for cross-domain Text-to-SQL. In SADGA, we adopt a graph structure to provide a unified encoding model for both the natural language question and the database schema. Based on this unified modeling, we further devise a structure-aware aggregation method to learn the mapping between the question-graph and the schema-graph, featuring Global Graph Linking, Local Graph Linking, and a Dual-Graph Aggregation Mechanism. We not only study the performance of our proposal empirically; it had also achieved 3rd place on the challenging Text-to-SQL benchmark Spider at the time of writing.
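
To make the linking idea concrete, here is a minimal sketch of cross-graph attention between question-graph and schema-graph node embeddings; it illustrates question-schema linking generically and is not SADGA's actual Global/Local Graph Linking modules:

```python
import torch
import torch.nn.functional as F

d = 64
q_nodes = torch.randn(9, d)     # question-graph node embeddings (one per word)
s_nodes = torch.randn(14, d)    # schema-graph node embeddings (tables/columns)

# Cross-graph linking: each question node attends over all schema nodes.
scores = q_nodes @ s_nodes.T / d ** 0.5     # (9, 14) linking scores
attn = F.softmax(scores, dim=-1)
linked = attn @ s_nodes                     # schema context per question node

# Aggregation: fuse each question node with its linked schema context.
fuse = torch.nn.Linear(2 * d, d)
fused = torch.tanh(fuse(torch.cat([q_nodes, linked], dim=-1)))
```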


LSS-SKAN: Efficient Kolmogorov-Arnold Networks based on Single-Parameterized Function

arXiv.org Artificial Intelligence

The recently proposed Kolmogorov-Arnold Networks (KANs) have attracted increasing attention due to their advantage in visualizability compared to MLPs. In this paper, based on a series of small-scale experiments, we propose the Efficient KAN Expansion Principle (EKE Principle): allocating parameters to expand network scale, rather than employing more complex basis functions, leads to more efficient performance improvements in KANs. Based on this principle, we propose a superior KAN, termed SKAN, whose basis function uses only a single learnable parameter. We then evaluated various single-parameterized functions for constructing SKANs, with LShifted Softplus-based SKANs (LSS-SKANs) demonstrating superior accuracy. Subsequently, extensive experiments were performed comparing LSS-SKAN with other KAN variants on the MNIST dataset. In the final accuracy tests, LSS-SKAN exhibited superior performance on MNIST compared to all tested pure KAN variants; in execution speed, it likewise outperformed all compared popular KAN variants. Zhijie Chen and Xinglin Zhang are with the School of Computer Science and Engineering, South China University of Technology, Guangzhou, China. The rapid development of artificial intelligence (AI) is reshaping our world.
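
A minimal sketch of a KAN-style layer in which every edge function carries a single learnable parameter. The shifted-softplus form phi_k(x) = softplus(x + k) is our guess at an LSS-like basis; the paper's exact definition may differ:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SKANLayer(nn.Module):
    """Each input-output edge applies a basis function with ONE learnable
    parameter k: here phi_k(x) = softplus(x + k), a shifted-softplus guess."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.k = nn.Parameter(torch.zeros(out_dim, in_dim))  # one scalar per edge

    def forward(self, x):                   # x: (batch, in_dim)
        shifted = x.unsqueeze(1) + self.k   # (batch, out_dim, in_dim)
        return F.softplus(shifted).sum(-1)  # sum edge activations per output node

layer = SKANLayer(784, 10)                  # e.g., flattened MNIST digits
out = layer(torch.randn(8, 784))            # (8, 10)
```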


LLM+KG@VLDB'24 Workshop Summary

arXiv.org Artificial Intelligence

The unification of large language models (LLMs) and knowledge graphs (KGs) has emerged as a hot topic. At the LLM+KG'24 workshop, held in conjunction with VLDB 2024 in Guangzhou, China, a key theme was the data management challenges and opportunities arising from effective interaction between LLMs and KGs. This report outlines the major directions and approaches presented by the various speakers at the LLM+KG'24 workshop.


Enhancing Campus Mobility: Achievements and Challenges of Autonomous Shuttle "Snow Lion"

arXiv.org Artificial Intelligence

In recent years, the rapid evolution of autonomous vehicles (AVs) has reshaped global transportation systems. Leveraging the accomplishments of our earlier endeavor, particularly "Hercules" [1], an autonomous logistics vehicle for transporting goods, we introduce "Snow Lion", an autonomous shuttle vehicle meticulously designed to transform on-campus transportation, providing a safe and efficient mobility solution for students, faculty, and visitors. The main aim of this research is to improve campus mobility through a dependable, efficient, and eco-friendly autonomous transportation solution tailored to the diverse requirements of a university setting. This initiative differs significantly from the experiences of "Hercules" [1], as the campus environment presents a notable contrast to the structured environments of highways and urban streets. Emphasizing both security and passenger comfort, the primary focus is on passenger transportation. Achieving this goal involves a detailed examination of complex system designs that integrate trajectory planning adjustments, prioritizing pedestrian safety.
[Figure 1: The operational scenario of our autonomous shuttle during its service period at The Hong Kong University of Science and Technology (Guangzhou), referred to as HKUST (GZ).]


Differentially Private Over-the-Air Federated Learning Over MIMO Fading Channels

arXiv.org Artificial Intelligence

Federated learning (FL) enables edge devices to collaboratively train machine learning models, with model communication replacing direct data uploading. While over-the-air model aggregation improves communication efficiency, uploading models to an edge server over wireless networks can pose privacy risks. Differential privacy (DP) is a widely used quantitative technique for measuring statistical data privacy in FL. Previous research has focused on over-the-air FL with a single-antenna server, leveraging communication noise to enhance user-level DP. This approach achieves so-called "free DP" by controlling transmit power rather than introducing additional DP-preserving mechanisms at the devices, such as adding artificial noise. In this paper, we study differentially private over-the-air FL over a multiple-input multiple-output (MIMO) fading channel. We show that FL model communication with a multiple-antenna server amplifies privacy leakage when the server employs separate receive combining for model aggregation and information inference. Consequently, relying solely on communication noise, as done in the multiple-input single-output setting, cannot meet high privacy requirements, and a device-side privacy-preserving mechanism is necessary for optimal DP design. We analyze the learning convergence and privacy loss of the studied FL system and propose a transceiver design algorithm based on alternating optimization. Numerical results demonstrate that the proposed method achieves a better privacy-learning trade-off than prior work. The emergence of artificial intelligence (AI) applications that leverage the massive data generated at the edge of wireless networks has attracted widespread interest [2], [3]. Federated learning is a popular paradigm for exploiting edge devices' data and computation power for distributed machine learning: FL coordinates the distributed training of an AI model on edge devices by periodically sharing model information with an edge server [4]. This work was supported in part by the General Research Fund (project numbers 14201920, 14202421, 14214122, 14202723), the Area of Excellence Scheme grant (project number AoE/E-601/22-R), and the NSFC/RGC Collaborative Research Scheme (project number CRS_HKUST603/22), all from the Research Grants Council of Hong Kong. The work of J. Yan was supported in part by Guangzhou Municipal Science and Technology Project 2023A03J0011. Part of this work was presented at the IEEE Global Communications Conference (GLOBECOM), Kuala Lumpur, Malaysia, December 2023 [1]. He is now with the Department of Electrical and Computer Engineering at Cornell Tech, Cornell University, NY 10044, USA.
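
A minimal sketch of the device-side privacy-preserving mechanism the paper argues becomes necessary with a multi-antenna server: clip each local model update, then add artificial Gaussian noise before over-the-air aggregation. The clip_norm and sigma values below are free parameters of this illustration, not the paper's optimized transceiver design:

```python
import numpy as np

def privatize_update(update, clip_norm=1.0, sigma=0.8,
                     rng=np.random.default_rng(0)):
    """Clip an update to bounded L2 norm, then add Gaussian noise (device-side DP)."""
    norm = np.linalg.norm(update)
    clipped = update * min(1.0, clip_norm / (norm + 1e-12))
    return clipped + rng.normal(0.0, sigma * clip_norm, size=update.shape)

# Over-the-air aggregation model: the server observes only the noisy sum
# of the devices' transmitted updates, here averaged into a global update.
updates = [np.random.randn(1000) for _ in range(10)]          # 10 devices
aggregate = sum(privatize_update(u) for u in updates) / 10
```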


FedAL: Black-Box Federated Knowledge Distillation Enabled by Adversarial Learning

arXiv.org Artificial Intelligence

Knowledge distillation (KD) can enable collaborative learning among distributed clients that have different model architectures and do not share their local data or model parameters with others. Each client updates its local model using the average model output/feature of all client models as the target, a setting known as federated KD. However, existing federated KD methods often perform poorly when clients' local models are trained on heterogeneous local datasets. In this paper, we propose Federated knowledge distillation enabled by Adversarial Learning (FedAL) to address data heterogeneity among clients. First, to alleviate the local-model output divergence across clients caused by data heterogeneity, the server acts as a discriminator that guides clients' local model training toward consensus model outputs through a min-max game between the clients and the discriminator. Moreover, catastrophic forgetting may occur during the clients' local training and global knowledge transfer due to clients' heterogeneous local data. To address this challenge, we design a less-forgetting regularization for both local training and global knowledge transfer, guaranteeing clients' ability to transfer knowledge to, and learn knowledge from, others. Experimental results show that FedAL and its variants achieve higher accuracy than other federated KD baselines. Collaborative learning among multiple clients can be useful for producing models with better accuracy, but several challenges arise. First, clients have their own local datasets and may be unwilling to share their raw data with others due to privacy concerns [1]. Second, clients at the edge of wireless networks often have different computation and memory resources, resulting in heterogeneous models with different architectures and parameters. Clients may also not want to reveal their model architectures to other clients, to further prevent privacy leakage [2], [3]. We refer to a client's model whose architecture is unknown to other clients as a black-box model. Pengchao Han is with the School of Information Engineering, Guangdong University of Technology, Guangzhou 510006, China. Han's contribution to this work was made when she was a Postdoc research associate at The Chinese University of Hong Kong, Shenzhen, China.
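
A minimal sketch of the federated-KD objective the abstract starts from, in which each client distills toward the average output of all clients on shared inputs; FedAL's adversarial discriminator and less-forgetting regularizers are omitted, and the temperature is our choice:

```python
import torch
import torch.nn.functional as F

# Logits from 3 heterogeneous client models on a shared batch of public inputs.
client_logits = [torch.randn(16, 10, requires_grad=True) for _ in range(3)]

# Federated KD target: the average output of all clients (treated as fixed).
avg_logits = torch.stack(client_logits).mean(0).detach()

# Each client minimizes KL(teacher_avg || student) on its own logits.
T = 2.0  # distillation temperature
teacher = F.softmax(avg_logits / T, dim=-1)
kd_losses = [F.kl_div(F.log_softmax(z / T, dim=-1), teacher,
                      reduction="batchmean")
             for z in client_logits]
kd_losses[0].backward()  # gradient step for client 0's model
```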